Experiments on the zero frequency problem

Authors

  • John G. Cleary
  • W. J. Teahan
Abstract

The best algorithms for lossless compression of text are those which adapt to the text being compressed [1]. Two classes of such adaptive techniques are commonly used. One class matches the text against a dictionary of strings already seen and transforms the text into a list of indices into the dictionary. These techniques are usually formulated as variants of Ziv-Lempel (LZ) compression. While LZ compressors do not give the best compression, they are widely used because of their simplicity and low execution overhead.

The best compression is obtained by another class of compressors, which use adaptive statistical modelling. These split compression into two steps. The first step accumulates a statistical model of the characters seen so far in the input text. As each character is encoded, this model is used to generate a probability distribution over the characters which can occur next. Arithmetic coding is then used to optimally encode the character which actually does occur with respect to this distribution.

The best compression has been obtained from a series of variants of PPM modelling [1]. PPM models are built up by counting the characters that have occurred following contexts of prior characters. For example, all the characters following 'a' are recorded. The next time 'a' occurs, the counts associated with it are used to generate the probability distribution for the following character. The PPM techniques blend together the predictions from contexts of varying lengths to arrive at an overall probability distribution. For practical reasons of memory usage and execution time, most PPM variants fix an upper bound on the lengths of the contexts, although recently a variant which uses unbounded-length contexts has been very successful [2]. The focus of this paper is the problem of transforming the set of counts accumulated for a particular context into a probability distribution.
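The context counting described above can be illustrated with a minimal sketch. It assumes contexts of exactly one prior character (an order-1 model) and does no blending across context lengths, so it is far simpler than any actual PPM variant; the function name `order1_counts` and the sample string are hypothetical and not from the paper.

```python
from collections import Counter, defaultdict


def order1_counts(text):
    """Count which characters follow each single-character context.

    Assumption: contexts are exactly one character long. Real PPM
    variants track several context lengths and blend their predictions.
    """
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    return counts


# Hypothetical sample input, chosen only for illustration.
counts = order1_counts("abracadabra")

# The counts recorded after the context 'a' are what a model would
# turn into a probability distribution the next time 'a' occurs.
print(dict(counts["a"]))
```

How these raw counts should be converted into probabilities, especially for characters that have not yet been seen in a context, is exactly the question the paper goes on to study.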
To simplify our discussion and later experiments we will focus on the case when the alphabet of characters is binary, with just two symbols: 0 and 1. In a statistical model each context will then deliver two counts: C0, the number of times a 0 has occurred, and C1, the number of times a 1 has occurred. A naive estimate of the probability of character i could be obtained by the ratio Ci / (C0 + C1).
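A small sketch can make the difficulty with the naive estimate concrete. It assumes the naive ratio is the count of symbol i divided by the total count C0 + C1, and that an ideal arithmetic coder spends about -log2(p) bits on a symbol of probability p. The Laplace-style estimator shown for contrast is a standard textbook remedy added here for illustration only; it is not claimed to be the estimator the authors propose, and all function names are hypothetical.

```python
import math


def naive_estimate(c0, c1):
    """Return (p0, p1) as plain relative frequencies of the two counts."""
    total = c0 + c1
    if total == 0:
        raise ValueError("no symbols observed in this context")
    return (c0 / total, c1 / total)


def code_length_bits(p):
    """Ideal code length in bits for a symbol of probability p."""
    return math.inf if p == 0 else -math.log2(p)


def laplace_estimate(c0, c1):
    """Add one to each count (an illustrative assumption, not the paper's method)."""
    total = c0 + c1 + 2
    return ((c0 + 1) / total, (c1 + 1) / total)


# A context that has seen three 0s and no 1s so far.
p0, p1 = naive_estimate(3, 0)
print(p1, code_length_bits(p1))   # a 1 gets probability zero and cannot be coded

q0, q1 = laplace_estimate(3, 0)
print(q1, code_length_bits(q1))   # a 1 now has a finite, if large, cost
```

With the naive ratio, a symbol that has never occurred in a context receives probability zero, so it cannot be encoded at all when it finally appears. This is the zero frequency problem named in the title.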

Similar resources

Multi-Stage Fuzzy Load Frequency Control Based on Multi-objective Harmony Search Algorithm in Deregulated Environment

A new Multi-Stage Fuzzy (MSF) controller based on the Multi-objective Harmony Search Algorithm (MOHSA) is proposed in this paper to solve the Load Frequency Control (LFC) problem of power systems in a deregulated environment. LFC problems are caused by load perturbations, which continuously disturb the normal operation of the power system. The objectives of LFC are to minimize the transient deviati...

On Interaction of T-S Waves and 3-D Localized Disturbance in a Divergent Flow Under Zero Pressure Gradient

To simulate the effect of free-stream turbulence on turbulent spot formation, experiments were conducted on the interaction of localized three-dimensional disturbances with harmonic waves in a laminar boundary layer on a flat plate. Experiments conducted in three-dimensional diverging flow (but zero pressure gradient) show that, while individually the disturbances decay downstream, their intera...

Fixing of Cycle Slips in Dual-Frequency GPS Phase Observables using Discrete Wavelet Transforms

The occurrence of cycle slips is a major limiting factor in achieving sub-decimeter accuracy in positioning with GPS (Global Positioning System). In the past, several authors introduced a method based on different combinations of GPS data together with a Kalman filter to solve the problem of cycle slips. In this paper the same philosophy is used but with discrete wavelet transforms. For...

A Basic Period Approach for Solving the Economic Lot and Delivery Scheduling in Flexible Flow Lines

In this paper, the problem of lot sizing, scheduling and delivery of several items in a two-stage supply chain over a finite planning horizon is studied. A single supplier, using a flexible flow line (FFL) production system, produces several items and delivers them directly to an assembly facility. Based on the basic period (BP) strategy, a new mixed zero-one nonlinear programming model has been develop...

Fuzzy Forcing Set on Fuzzy Graphs

The main aim of this paper is to investigate the impact of fuzzy sets on the zero forcing set. The results lead us to a new concept, which we introduce as the Fuzzy Zero Forcing Set (FZFS). We propose this concept and suggest a polynomial-time algorithm to construct an FZFS. Furthermore, we compute the propagation time of an FZFS on fuzzy graphs. This concept can be more efficient to model...

Compensation of Doppler Effect in Direct Acquisition of Global Positioning System using Segmented Zero Padding

Because of the very high chip rate of the Global Positioning System (GPS), P-code acquisition at the GPS receiver is challenging. A variety of methods for increasing the probability of detection and reducing the average acquisition time have been proposed, among which the Zero Padding (ZP) method is the most essential and the most widely used. The method using the Fast Fourier Transform (FFT...

Publication date: 1995